ABSTRACT



D. Wood and J. Erskine (1976) and B. Thompson (1989) 
provided bibliographies of roughly 130 applications of canonical correlation 
analysis, but the features of such reports have not been widely studied. This 
report examines the features of recent canonical reports, including 
substantive inquiries, but also measurement applications examining 
multivariate validity and multivariate reliability. One particular area of 
interest focuses on interpretation of functions as against structure 
coefficients. Little appears to have changed since the publication of the 
Wood and Erskine study. The current review of the literature yields similar 
results about the confusing and somewhat arbitrary use of canonical 
terminology. Several analyses are highlighted that illustrate why students 
have so much trouble understanding canonical results. In addition to using 
confusing terminology, many authors failed to provide all the information 
needed to evaluate their conclusions. Recommendations for reporting canonical 
results include evaluating both the squared canonical correlation 
coefficients and statistical significance test results to decide which 
canonical functions to interpret. Both the canonical function coefficients 
and the canonical structure coefficients should be interpreted for noteworthy 
functions. One should usually not try to interpret the redundancy 
coefficients. One must, however, examine the communality coefficients for the 
variables that do not contribute to the overall canonical correlation 
solution, and one should evaluate the generalizability of the results through 
statistical or empirical means. Measurement applications are outlined. 
(Contains 1 table and 51 references.) (SLD)






TM028219 






Running head: FEATURES OF PUBLISHED ANALYSES OF CANONICAL RESULTS 



Features of Published Analyses of Canonical Results 
Terresa M. Humphries-Wadsworth 
Texas A&M University 77843-4225 






Paper presented at the annual meeting of the American Educational 
Research Association, San Diego, CA, April 13, 1998.






Abstract 

Wood and Erskine (1976) and Thompson (1989) provided 
bibliographies of roughly 130 applications of canonical 
correlation analysis, but the features of such reports have not 
been widely studied. The present study examines the features of 
recent canonical reports, including substantive inquiries, but 
also measurement applications examining multivariate validity and 
multivariate reliability. One particular area of interest focuses 
on interpretation of function as against structure coefficients.

Features of Published Analyses
of Canonical Results

Hinkle, Wiersma, and Jurs (1979, p. 415) noted some twenty 
years ago that it was "increasingly important for behavioral 
scientists to understand multivariate procedures even if they do 
not use them in their own research." Similarly, Grimm and 
Yarnold (1995) recently noted that, "In the last 20 years, the 
use of multivariate statistics has become commonplace. Indeed, 
it is difficult to find empirically based articles that do not 
use one or another multivariate analysis" (p. vii). Thus,
Emmons, Stallings, and Layne (1990) conducted an empirical study
of 16 years of research reports in three journals, and found that
the multivariate characteristic of the social science 
research environment with its many confounding or 
intervening variables has been addressed through the 
trend toward increased use of multivariate analysis of 
variance and covariance, multiple regression, and 
multiple correlation. (p. 14) 

There were, and continue to be, good reasons for this trend. 
First, multivariate analyses limit the inflation of the Type I 
"experimentwise" error rates which can occur when a researcher 
conducts multiple univariate analyses. Second, multivariate 
methods honor 'real life' complexities "in which most outcomes 
have multiple causes, and in which most causes have multiple
effects" (Thompson, 1986, p. 9). The general linear model (GLM)
allows a researcher to investigate relationships between
potential causes (independent variables) and observed effects
(dependent variables).
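
The inflation of the experimentwise error rate mentioned above
can be illustrated with a short worked equation. This is a sketch
under an assumption not stated in the text, namely that the k
univariate tests are independent and each uses the same testwise
alpha; the symbols are introduced here for illustration only.

```latex
\alpha_{EW} = 1 - (1 - \alpha_{TW})^{k},
\qquad \text{e.g.,} \qquad
1 - (1 - .05)^{10} \approx .40
```

Under these assumptions, ten univariate tests each conducted at
the .05 level carry roughly a 40% chance of at least one Type I
error somewhere in the set. When the tests are correlated, the
experimentwise rate falls between the testwise alpha and this
upper value.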

The GLM "produces an equation that minimizes the mean
differences of independent variables as they are related to a
dependent variable" (Vidal, 1997). The most general case of the
parametric GLM is a structural model in structural equation
modeling, which subsumes canonical correlation analysis as a
special case (Bagozzi, Fornell, & Larcker, 1981; Dawson, 1998;
Fan, 1996, 1997). Canonical correlation analysis, in turn,
subsumes all other parametric multivariate analyses and
regression as special cases (Baggaley, 1981; Thompson, 1991a),
while regression subsumes all the univariate methods as special
cases (Cohen, 1968). As Knapp (1978) noted, "Virtually all of
the commonly encountered parametric tests of significance can be
treated as special cases of canonical correlation analysis" (p.
410). Therefore, canonical correlation analysis is the logical
choice for examination, if one wishes to understand classical
parametric multivariate (and univariate for that matter) analysis
procedures.

Canonical correlation analysis can be thought of (in
somewhat simplistic terms) as a bivariate correlation between two
sets of synthetic or latent variables (Thompson, in press-a).
The principal aim of canonical correlation analysis is to find a
linear combination of the variables in one set that correlates
maximally with a linear combination of the variables in the second
set. (It is possible to conduct canonical correlation analysis
with more than two sets of variables; however, for clarity's sake
let us focus on the bivariate analog.) In order to accomplish this
maximal correlation, CCA computes weights called canonical
function coefficients (analogous to beta weights in multiple
regression). In fact, as Thompson explained,

These weights are all analogous, but are given 
different names in different analyses (e.g., beta 
weights in regression, pattern coefficients in 
factor analysis, discriminant function coefficients 
in discriminant analysis, and canonical function 
coefficients in canonical correlation analysis),
mainly to obfuscate the commonalities of parametric 
methods, and to confuse graduate students. 

(Thompson, 1995, p. 87) 

In a similar effort to confuse the graduate students, the 
analogous systems of these weights are arbitrarily given 
different names (e.g., "equation," "factor," "function," "rule"), 
and so too the analogous synthetic/latent variables derived by 
applying the weights to measured/observed variables are 
arbitrarily given different names (e.g., "Yhat," "factor scores,"
"discriminant function scores," or "canonical function scores").
Table 1 summarizes the panoply of confusing jargon (one is
reminded of one of the old Bob Newhart television shows, where
one character regularly notes, "Hi, I'm Darrell, and this is my
other brother, Darrell").

The number of canonical functions that can be computed is
equal to the number of variables in the smaller of the two
variable sets (Thompson, 1991b). The canonical function
coefficients are then applied to an individual's set of measured
or observed scores (which have been converted to a standard
metric, i.e., z scores), producing a "synthetic" variable score
(Thompson, in press-a). The synthetic variable score is the
focus of all statistical analysis, and is an estimate of the
latent construct actually of interest in all statistical
analyses.
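
The computation just described can be sketched in a few lines of
Python. This is a minimal illustration, not a procedure taken
from any of the cited sources: it assumes NumPy is available,
uses simulated (hypothetical) data, and the names
canonical_functions, zscore, scores_x, and scores_y are invented
here. It obtains the standardized function coefficients from a
singular value decomposition of the whitened between-set
correlation matrix, which is one of several equivalent routes.

```python
import numpy as np

def zscore(M):
    """Convert each column to z scores (mean 0, SD 1)."""
    return (M - M.mean(axis=0)) / M.std(axis=0, ddof=1)

def canonical_functions(X, Y):
    """Return canonical correlations and standardized function
    coefficients for two variable sets (rows = cases)."""
    Zx, Zy = zscore(X), zscore(Y)
    n = Zx.shape[0]
    Rxx = Zx.T @ Zx / (n - 1)          # within-set correlations
    Ryy = Zy.T @ Zy / (n - 1)
    Rxy = Zx.T @ Zy / (n - 1)          # between-set correlations
    Lx = np.linalg.cholesky(Rxx)       # Rxx = Lx @ Lx.T
    Ly = np.linalg.cholesky(Ryy)
    # Whitened cross-correlation matrix; its singular values are
    # the canonical correlations
    K = np.linalg.solve(Lx, Rxy) @ np.linalg.inv(Ly).T
    U, rc, Vt = np.linalg.svd(K, full_matrices=False)
    A = np.linalg.solve(Lx.T, U)       # function coefficients, X set
    B = np.linalg.solve(Ly.T, Vt.T)    # function coefficients, Y set
    return rc, A, B

# Hypothetical data: 3 variables in one set, 2 in the other
rng = np.random.default_rng(0)
n = 200
shared = rng.normal(size=n)
X = np.column_stack([shared + rng.normal(size=n) for _ in range(3)])
Y = np.column_stack([shared + rng.normal(size=n) for _ in range(2)])

rc, A, B = canonical_functions(X, Y)
scores_x = zscore(X) @ A               # synthetic variable scores
scores_y = zscore(Y) @ B
print("number of functions:", len(rc))  # size of the smaller set
print("canonical correlations:", rc)
```

Each column of A or B holds the weights for one function, and the
bivariate correlation between the matching columns of scores_x
and scores_y reproduces the corresponding canonical correlation.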

Canonical correlation analysis is a rich analytic tool for 
examining multiple dimensions of the synthetic variable 
relationships. Thus, Wood and Erskine (1976) and Thompson (1989) 
provided bibliographies of roughly 130 applications of canonical 
correlation analysis. The researcher is able to examine the 
relationships between the measured variables (within a set) and 
the synthetic variable scores within a given function through two 
avenues, examining the standardized function coefficients and the 
structure coefficients.

As noted previously, the standardized canonical function 
coefficient is the analog of the beta weight in regression, and 
since many regression researchers erroneously interpret their
results by consulting only the beta weights (Thompson, 1997;
Thompson & Borrello, 1985), one might expect many canonical
researchers to make this same mistake. However, "if the
variables within each set are moderately intercorrelated, the
possibility of interpreting the canonical variates by inspection
of the appropriate regression weights [function coefficients] is
practically nil" (Meredith, 1964, p. 55). Structure coefficients
must also be consulted since they allow a researcher to interpret 
the canonical variates even when the variables are 
intercorrelated. 

A structure coefficient is the "bivariate product-moment 
correlation between the scores on an observed or measured 
variable and scores on a synthetic or latent variable" (Thompson, 
in press-a) for the given variable set. Structure coefficients 
inform the researcher about the contribution of each measured 
variable to the construction of the function. Squared structure 
coefficients represent the proportion of variance shared by a 
variable and the variable's canonical composite. Inspecting the 
relative contributions of the variables allows the researcher to 
interpret and understand the latent/synthetic scores on the given 
function. 
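
A brief Python sketch may make the definition concrete. It is
offered as an illustration only: the data are simulated, NumPy is
assumed, and the variable names (structure_x, scores_x, and so
on) are hypothetical. The sketch computes structure coefficients
directly as Pearson correlations between each measured variable
and each synthetic score, and then confirms that the same values
result from multiplying the within-set correlation matrix by the
standardized function coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
shared = rng.normal(size=n)
# Hypothetical data: three intercorrelated variables in one set,
# two in the other
X = np.column_stack([shared + rng.normal(size=n) for _ in range(3)])
Y = np.column_stack([shared + rng.normal(size=n) for _ in range(2)])

def zscore(M):
    return (M - M.mean(axis=0)) / M.std(axis=0, ddof=1)

Zx, Zy = zscore(X), zscore(Y)
Rxx = np.corrcoef(Zx, rowvar=False)
Ryy = np.corrcoef(Zy, rowvar=False)
Rxy = Zx.T @ Zy / (n - 1)

# Standardized function coefficients for the X set (SVD route)
Lx, Ly = np.linalg.cholesky(Rxx), np.linalg.cholesky(Ryy)
K = np.linalg.solve(Lx, Rxy) @ np.linalg.inv(Ly).T
U, rc, Vt = np.linalg.svd(K, full_matrices=False)
A = np.linalg.solve(Lx.T, U)

scores_x = Zx @ A        # synthetic scores, one column per function

# Structure coefficient = r between a measured variable and the
# synthetic variable for its own set
structure_x = np.array(
    [[np.corrcoef(Zx[:, j], scores_x[:, k])[0, 1]
      for k in range(A.shape[1])]
     for j in range(Zx.shape[1])]
)

print(np.round(A, 3))            # function coefficients
print(np.round(structure_x, 3))  # structure coefficients
print(np.allclose(structure_x, Rxx @ A))
```

Because the structure coefficients equal the within-set
correlation matrix times the function coefficients, the two sets
of coefficients coincide only when that matrix is an identity
matrix, that is, when the variables in the set are uncorrelated.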

Through inspection of both the standardized function
coefficients and the structure coefficients for a given function,
one is able to identify those variables which (a) contribute
nothing to the understanding of the relationship between the
variable sets (have both near-zero structure coefficients and
near-zero standardized function coefficients); (b) are
arbitrarily denied credit for their predictive contributions
(have a near-zero function coefficient and a large structure
coefficient, i.e., approaching -1 or +1); (c) are demonstrating
suppression effects (standardized function coefficients with a
large absolute value and near-zero structure coefficients); and
(d) are perfectly uncorrelated (in this unusual case, the
function and structure coefficients are identical; see Thompson,
1984). Clearly one can miss a great deal of the information
provided in the canonical correlation analysis if one fails to
examine both structure and standardized function coefficients in
the results.

As mentioned earlier, canonical correlation analysis is a
rich analytic tool for examining multiple dimensions of the
synthetic variable relationships. In addition to the standardized
function coefficients and the structure coefficients, three other
coefficients are produced and bear mentioning: the canonical
communality coefficient, the canonical adequacy coefficient, and
the canonical redundancy coefficient.

The canonical communality coefficients are "equal to the sum 
of the squared structure coefficients for a given variable across 
the canonical functions" (Thompson, in press-a). In this manner
one is able to examine how much each variable (within a set)
contributes to the overall understanding of the variable set
relationships (the overall canonical solution).

The canonical adequacy coefficient is equal to the mean of 
the squared structure coefficients for one variable set on one 
function. The canonical adequacy coefficient indicates how well, 
on the average, a given function reproduces the variance in the 
original measured variables (Thompson, 1984).

The redundancy coefficient is the product of the canonical
adequacy coefficient and the squared canonical
correlation. The redundancy coefficient is only useful when one
is attempting to establish concurrent validity between identical
sets of variables, for example when one is expecting g functions,
in which case the redundancy coefficient will (hopefully) equal
1.0 (Thompson, 1984). However, redundancy coefficients are not
multivariate statistics, and are not optimized as part of the
analysis, and thus usually have very limited utility (Cramer &
Nicewander, 1979; Thompson, in press-a).
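
The three verbal definitions above can be restated compactly. The
notation is introduced here for illustration and does not appear
in the sources cited: s_{jk} is the structure coefficient of
measured variable j on function k, p is the number of variables
in the set, m is the number of canonical functions, and R_{c_k}
is the canonical correlation for function k.

```latex
h_j^2 = \sum_{k=1}^{m} s_{jk}^2 , \qquad
Ad_k = \frac{1}{p} \sum_{j=1}^{p} s_{jk}^2 , \qquad
Rd_k = Ad_k \, R_{c_k}^2
```

Read this way, the communality coefficient sums squared structure
coefficients across functions for one variable, the adequacy
coefficient averages them across variables for one function, and
the redundancy coefficient scales the adequacy coefficient by the
squared canonical correlation.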

Many introductory statistics students find canonical 
correlation analysis "confusing." Thompson (1980) points out, 
"the neophyte student of canonical correlation analysis may be 
overwhelmed by the myriad coefficients which the procedure 
produces" (p. 16). Certainly the list of coefficients covered
thus far is indicative of this possibility. Learning the
meanings of each of these coefficients and the proper
interpretations of the combinations of coefficients can indeed be
a daunting task.

A second source of confusion is that researchers using canonical
correlation analysis do not consistently use the same
terminology in referring to the coefficients and frequently
interchange words and meanings. Wood and Erskine (1976)
elucidate:

One researcher's canonical loading becomes 
another's canonical weight; canonical dimension 
to one is a canonical variate to another; and 
canonical correlation is the relationship between 
data sets for one, but only the relationship 
between variates for another. (p. 864)

The present study examined features of recent canonical 
reports, including substantive studies, but also measurement 
applications examining multivariate validity and multivariate 
reliability. A search of the database PsycINFO was conducted to 
identify articles published from 1988 to February, 1998. 

Portrait of Contemporary Canonical Practices

Little has changed in the 20 years since Wood and Erskine's
(1976) commentary on the confusing and somewhat arbitrary use of
canonical terminology. The current review of literature
reporting canonical correlation analysis yielded similar results.
For example, structure coefficients were called by many terms,
e.g., "correlation loadings" (Strack, 1994), "canonical loadings"
(Van de Geer, 1993), and "canonical correlates" (Retzlaff &
Bromley, 1991). Other authors reported correlations, but did not
specify what was being correlated.

Sadly, many authors report only function coefficients or 
only structure coefficients. One cannot assume that function and 
structure coefficients are the same or produce similar 
interpretations of the data. Thompson (1991b) pointed out, "the 
structure and function coefficients for a variable set will be 
equal only if the variables in a set are all exactly uncorrelated 
with each other." Therefore, reporting and interpreting both the
function and structure coefficients is necessary in studies which
reveal correlations among variables (for a dissenting opinion,
see Harris, 1989).

The difficulties in the published literature are 
demonstrated through several examples of reported analyses of 
canonical results. The first example is a study by Roszkowski, 
Spreat, and Waldman (1983). This article is a good example of
why students have difficulty understanding canonical results.
The authors did not use tables for reporting their results. This
increases the difficulty in examining conclusions, and requires
considerable conscientious effort to sort through the reported
results. The authors also chose to utilize the terms "canonical
components" and "loadings" to refer to the structure
coefficients. "Loadings" is a term that has been interchanged
with so many meanings and other terms that it has lost
specificity. The word "loadings" is therefore neither descriptive
nor helpful in understanding the analysis, and some journal
editorial policies now explicitly proscribe the use of this term
(cf. Thompson, 1994). Similar difficulties in terminology were
noted in many articles (e.g., Adams, Lawrence, & Cook, 1979;
Brush & Schoenfeldt, 1979; Fuqua, Seaworth, & Newman, 1987;
Jelinek & Morf, 1995; Reid & Anderson, 1992; Strack, 1994;
Tomasco, 1980).

In addition to confusing terminology, many authors failed to 
provide all the information necessary to evaluate their 
conclusions. Some authors chose to be overly selective by
presenting only partial results, omitting those coefficients that
fell below a specified criterion (e.g., Brush & Schoenfeldt,
1978; Gerbing & Tuley, 1991), or reporting ranges of results
(e.g., Roszkowski, Spreat, & Waldman, 1983). This slipshod style
of reporting prevents the reader from fully evaluating the
reported conclusions. (A more cynical reviewer might conclude
this was the intended purpose.) As noted previously, information
on both standardized function coefficients and structure
coefficients across their full range of values is necessary to
evaluate the potential influences of suppressor effects, to
distinguish those variables which may arbitrarily not be getting
predictive credit, to identify useless variables, and to identify
perfectly uncorrelated variables. When coefficients are absent
from the reported results, the reader is unable to search for
these potentially interesting anomalies.

Not all of the articles were exercises in frustration. Four
articles stood apart from the rest by reporting and interpreting
both function and structure coefficients (McIntosh, Mulkins,
Pardue-Vaughn, Barnes, & Gridley, 1992; McLean, Kaufman, &
Reynolds, 1988; Reynolds, Stanton, McLean, & Kaufman, 1989;
Sexton, McLean, Boyd, Thompson, & McCormick, 1988). These four
articles were refreshing. They were clear, concise, easily
inspected for verification of reported results, and provided
complete information on standardized function coefficients and
structure coefficients. Amid the muddled, incomplete efforts of
their peers, these articles stood out as shining examples of how
canonical articles should be presented in the literature.

Recommended Reporting Practices

Clearly, the beginning statistics student has good reason to
be confused, notwithstanding the four exceptions to the
unfortunate rule. This confusion may be resolved, at least in
part, by employing a set of guidelines suggested by Thompson
(1991b). These guidelines offer substantive and thoughtful
suggestions for reporting and interpreting canonical results and 
are offered in five sequential steps. 

The first step is to evaluate both the squared canonical
correlation coefficients and statistical significance test
results to decide which canonical functions to interpret.
Statistical significance tests are, of course, tied to sample
size (Cohen, 1994; Thompson, in press-b), and most researchers
know in advance of running their data whether the sample size was
large enough for adequate examination. There is a problem,
however, with the way most statistical packages evaluate the
canonical functions. Many packages (e.g., SPSS, SAS)
do not test each separate function. Rather, tests are reported
for sets of functions, and only one test (that of the last
function alone) reflects the statistical significance of a single
function. Additionally, when conducting statistical tests, the
researcher must examine the distribution of the data to evaluate
multivariate normality.

Second, interpret both the canonical function coefficients 
and the canonical structure coefficients on the noteworthy 
functions (Thompson, in press-a). As mentioned earlier, it is
vital to examine both the function coefficients and the structure
coefficients in order to accurately interpret the results.
Failure to inspect the canonical structure coefficients can lead
to erroneous conclusions about the relationships of the 
variables. 

Third, (usually) do not try to interpret the redundancy
coefficients (Thompson, 1991b, in press-a). As mentioned
earlier, redundancy coefficients are useful when one is
attempting to establish concurrent validity between identical
sets of variables, for example when one is expecting g functions,
in which case the redundancy coefficient will equal 1.0. This
use of redundancy coefficients is appropriate, but is not a
multivariate procedure. Any other use of the redundancy
coefficient is discouraged, as one should not attempt to apply a
univariate statistic to a multivariate interpretation.

Fourth, one must examine the communality coefficients to
identify those variables which do not contribute to the overall
canonical correlation solution. This information may be very
helpful in determining those measured variables which are useless
and may possibly be omitted from the overall analysis (Thompson,
1984).

Finally, evaluate the generalizability of the results
through statistical or preferably empirical means. A single
study does not establish fact. Science requires replication and
the extension of findings from a single study to understand
relationships among variables. Statistical significance tests do
not evaluate generalizability or replicability. Procedures such
as the bootstrap and the jackknife are appropriate techniques for
evaluating generalizability and, though once tedious and
time-consuming, can now be more easily accomplished through
computer programs. Thompson (1995) demonstrates the use of the
bootstrap in a canonical correlation analysis. Of course, true
"external" replications are more serious tests of result
replicability.

Chant and Dalgleish (1992) offer a SAS macro procedure for 
performing a jackknife analysis on canonical correlation and 
structure coefficients in a discriminant analysis in an effort to 
measure the standard error. Dalgleish and Chant (1995) offer a 
SAS macro procedure for performing a bootstrap analysis for the 
same coefficients. 
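
For readers without access to SAS, the general logic of a
bootstrap evaluation can be sketched in Python. This is not a
translation of the macros cited above; it is a minimal
illustration that assumes NumPy, uses simulated (hypothetical)
data, and bootstraps only the first canonical correlation. The
function names first_canonical_r and bootstrap_rc are invented
for this example.

```python
import numpy as np

def first_canonical_r(X, Y):
    """Largest canonical correlation, taken as the largest singular
    value of Qx.T @ Qy, where Qx and Qy come from QR decompositions
    of the column-centered data."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(min(s[0], 1.0))   # guard against rounding above 1

def bootstrap_rc(X, Y, n_boot=500, seed=0):
    """Resample cases with replacement and recompute the first
    canonical correlation in each resample."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    out = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        out[b] = first_canonical_r(X[idx], Y[idx])
    return out

# Hypothetical data: 150 cases, 3 variables in one set, 2 in the other
rng = np.random.default_rng(42)
n = 150
shared = rng.normal(size=n)
X = np.column_stack([shared + rng.normal(size=n) for _ in range(3)])
Y = np.column_stack([shared + rng.normal(size=n) for _ in range(2)])

estimate = first_canonical_r(X, Y)
boot = bootstrap_rc(X, Y)
se = boot.std(ddof=1)                        # bootstrap standard error
lo, hi = np.percentile(boot, [2.5, 97.5])    # percentile interval
print(f"Rc1 = {estimate:.3f}, SE = {se:.3f}, "
      f"95% interval = [{lo:.3f}, {hi:.3f}]")
```

The same resampling loop could in principle be applied to
function or structure coefficients, although those coefficients
can change sign or order across resamples and would need to be
aligned before being summarized.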

Measurement Applications 
Multivariate Reliability and Validity 

Reliability estimates of multivariate data are best 
calculated utilizing procedures that take into consideration the 
potential intercorrelations of the variables. Classical
reliability theory does not consider this potentiality.
In classical theory, reliability is defined as the ratio of true
variance to total variance. Yarnold (1984) attempted to extend
this classical theory to cover multivariate procedures.
Unfortunately, Yarnold's solution merely averaged the univariate
reliabilities, thus failing to account for any intercorrelations
among the variables.

Redundancy analysis appeared to be the next logical 
extension of the classical reliability definition to the 
multivariate case (Levin, 1993). However, redundancy analysis is
not truly multivariate and is not sensitive to the
intercorrelations of the variables being predicted (Cramer &
Nicewander, 1979).

Rae (1991) suggested that canonical correlation analysis
provides a measure of multivariate reliability that honors the
reality of the data, including potential intercorrelations of
variables. The multivariate reliability index, or canonical
reliability coefficient, is the average of the squared canonical
correlations between the observed scores and the latent true
scores. (This is based on the earlier work of Conger and
Lipshitz (1973), who invoked average squared Mahalanobis
distances in calculating a canonical reliability coefficient.)

Rae (1991) points out that when the measures comprising a 
variable set are perfectly uncorrelated, the results of the 
canonical reliability coefficient calculations will be identical 
to those using Yarnold's solution. 

Redundancy analysis, so far panned in this paper, can be a
valuable procedure when evaluating multivariate validity. As
mentioned earlier, redundancy coefficients are useful when one is 
attempting to establish concurrent validity between identical 
sets of variables, for example when one is expecting g functions, 
in which case, the redundancy coefficient will equal 1.0. 

Sexton, McLean, Boyd, Thompson, and McCormick (1988) 
effectively utilized canonical correlation analysis to 
investigate the criterion-related validity of the Battelle 
Developmental Inventory (BDI) against the Bayley Scales of Infant
Development. The concurrent validity of the BDI was supported
through large redundancy coefficients, thus indicating the scales
are tapping essentially the same constructs.

Additional Measurement Concerns 

Concern regarding the differential influence of sampling 
error on function and structure coefficients prompted Thompson 
(1991a) to conduct a Monte Carlo study. The results indicated 
that both sets of coefficients are influenced by sampling error 
and generally to about the same degree. The use of the Wherry 
correction as an effective correction for sampling error was 
demonstrated by Thompson (1990), who noted that sampling error is
less of a concern when researchers maintain a 10:1 ratio of
subjects to variables.

Liang, Krus, and Webb (1995) offered a K-fold
cross-validation procedure for canonical analysis to investigate
and ultimately reduce the sample-specific variance. They also
noted that their method reduces the demands of the
subjects-to-variables ratio.

Conclusion 

Canonical correlation analysis is a rich analytic tool for
examining multivariate questions. It has a reputation among
introductory statistics students for being confusing. The present
paper outlined some of the reasons for this confusion, and
potential solutions, through a review of recently published
analyses. Features of measurement applications of canonical
correlation analysis were also reviewed.

Table 1

The Confusing Language of Statistics
(Intentionally Designed to Confuse the Graduate Students)

                 Standardized         Weight          Synthetic/Latent
Analysis         Weights*             System          Variable(s)
---------------  -------------------  --------------  ----------------
Multiple         β                    "equation"      Yhat (Ŷ)
Regression

Factor           pattern              "factor"        factor
Analysis         coefficients                         scores

Descriptive      standardized         "function"      discriminant
Discriminant     function             -or-            function
Analysis         coefficients         "rule"          scores

Canonical        standardized         "function"      canonical
Correlation      function                             function
Analysis         coefficients                         scores

* Of course, the term "standardized weight" is an obvious
oxymoron. A given weight is a constant applied to all the scores
of all the cases/people on the observed/manifest/measured
variable, and therefore cannot be standardized. Instead, the
weighting constant is applied to the measured variable in its
standardized form, i.e., we should say "weight for the
standardized measured variables" rather than "standardized
weight".